feat: implement voice mode improvements with continuous loop, audio error recovery, and TTS discovery#459
…rror recovery, and TTS discovery
- Add voice mode state machine: Recording → Processing → Waiting → Generating → Playing → Recording loop
- Add audio error recovery with blob retention, retry, and discard
- Add audio cues (mic-on.wav, mic-off.wav) for recording state transitions
- Add TTS discovery prompt for first-time users on supported platforms
- Extend RecordingOverlay with 6 voice mode states (recording, processing, error, waiting, generating, playing)
- Update TTSContext with isGenerating, speakAndWait, cancelGeneration, and sequence ID tracking
- Update TTSButton with generating animation state
- Add voice mode exit on chat switch, new chat, and manual stop
- Highlight mic button when voice mode is active
Closes #458
Co-Authored-By: marks <markskram@protonmail.com>
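The loop above can be read as a small state machine. The sketch below is one way to model it in TypeScript, not the PR's actual code: the state names come from the RecordingOverlay list, while the event names (`stopRecording`, `transcribed`, `audioReady`, and so on) and the `nextVoiceState` function are hypothetical.

```typescript
// Voice mode states as listed for RecordingOverlay; null means voice mode is off.
type VoiceState =
  | "recording"
  | "processing"
  | "error"
  | "waiting"
  | "generating"
  | "playing"
  | null;

// Hypothetical events that drive the loop (illustrative names only).
type VoiceEvent =
  | "stopRecording"
  | "transcribed"
  | "transcriptionFailed"
  | "responseReady"
  | "audioReady"
  | "playbackEnded"
  | "retry"
  | "exit";

function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  // Chat switch, new chat, mic button, or an overlay X all end voice mode.
  if (event === "exit") return null;

  switch (state) {
    case "recording":
      return event === "stopRecording" ? "processing" : state;
    case "processing":
      if (event === "transcribed") return "waiting";
      if (event === "transcriptionFailed") return "error";
      return state;
    case "error":
      // Retry re-sends the retained audio blob.
      return event === "retry" ? "processing" : state;
    case "waiting":
      return event === "responseReady" ? "generating" : state;
    case "generating":
      return event === "audioReady" ? "playing" : state;
    case "playing":
      // After playback (and the 500ms pause) the loop records again.
      return event === "playbackEnded" ? "recording" : state;
    default:
      // Voice mode is off: events are ignored.
      return state;
  }
}
```

Keeping transitions in one pure function makes "which event is legal in which state" testable without rendering any UI; unrecognized events simply leave the state unchanged.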
Deploying maple with Cloudflare Pages
Latest commit: fd60b55
Status: ✅ Deploy successful!
Preview URL: https://ec93ca45.maple-ca8.pages.dev
Branch Preview URL: https://devin-1772755465-voice-mode.maple-ca8.pages.dev
…VoiceDiscard
- Add recordingStartTimeRef to track when recording starts (RecordRTC has no startTime property)
- Add startRecordingRef to avoid stale closure in handleVoiceDiscard's empty dependency array
- capturedDuration now correctly reflects actual recording time instead of always being 0
Co-Authored-By: marks <markskram@protonmail.com>
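A minimal sketch of the start-time approach, assuming a plain `{ current }` object in place of React's `useRef` and whole-second rounding; `markRecordingStarted` and `capturedDurationSeconds` are hypothetical helper names.

```typescript
// Plain-object stand-in for useRef; RecordRTC exposes no start time, so it is
// recorded manually when recording begins.
const recordingStartTimeRef: { current: number | null } = { current: null };

function markRecordingStarted(now: number = Date.now()): void {
  recordingStartTimeRef.current = now;
}

// Whole seconds since recording started; 0 if no start time was recorded.
function capturedDurationSeconds(now: number = Date.now()): number {
  const start = recordingStartTimeRef.current;
  if (start === null) return 0;
  return Math.max(0, Math.round((now - start) / 1000));
}
```

Injecting `now` keeps the helpers deterministic in tests; in the app both would just use `Date.now()`.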
… for savedDuration
- Replace all 3 startRecording() calls in voice continuation effect with startRecordingRef.current()
- Change recordingDuration || undefined to recordingDuration ?? undefined (both inputs) to correctly show 0-second durations in error UI
Co-Authored-By: marks <markskram@protonmail.com>
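The `||` to `??` change matters because `||` falls back on any falsy value, including `0`, while `??` only falls back on `null` or `undefined`. A short illustration (function names are hypothetical):

```typescript
// `||` falls back on every falsy value, so a real 0-second duration is lost.
function durationWithOr(recordingDuration: number | null): number | undefined {
  return recordingDuration || undefined;
}

// `??` falls back only on null/undefined, so 0 is preserved.
function durationWithNullish(recordingDuration: number | null): number | undefined {
  return recordingDuration ?? undefined;
}

console.log(durationWithOr(0)); // undefined
console.log(durationWithNullish(0)); // 0
console.log(durationWithNullish(null)); // undefined
```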
…g on empty blob
- isTTSPlatform now checks isTauri() && isIOS() to match TTSContext behavior
- Empty blob in voice mode now calls startRecordingRef.current() to restart
Co-Authored-By: marks <markskram@protonmail.com>
…ssage/handleTTSDiscovery
- exitVoiceMode: check recorderRef.current instead of isRecording state to avoid stale closure in event handler effect
- Add handleSendMessageRef and handleTTSDiscoveryRef to prevent stale closures in transcribeAndSend
- transcribeAndSend now calls through refs for latest handleSendMessage and handleTTSDiscovery
Co-Authored-By: marks <markskram@protonmail.com>
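The ref pattern used here (keep a ref pointing at the latest handler and call through it) can be illustrated without React. In this sketch a plain `{ current }` object stands in for `useRef`, and `render` is a hypothetical function that mimics one component render; the `chat-1`/`chat-2` values are illustrative.

```typescript
// Plain object standing in for React's useRef result (no React needed here).
interface Ref<T> {
  current: T;
}

type SendFn = (text: string) => string;

// Mimics one component render: a fresh handler closes over this render's
// chatId, and the ref is updated to point at it.
function render(chatId: string, sendRef: Ref<SendFn>): SendFn {
  const handleSendMessage: SendFn = (text) => `${chatId}:${text}`;
  sendRef.current = handleSendMessage;
  return handleSendMessage;
}

const handleSendMessageRef: Ref<SendFn> = { current: () => "" };
const firstRenderHandler = render("chat-1", handleSendMessageRef);

// Created once (like a callback with an empty dependency array):
// it captured the first render's handler directly...
const staleTranscribeAndSend = (text: string): string => firstRenderHandler(text);
// ...whereas calling through the ref reaches whatever is latest at call time.
const refTranscribeAndSend = (text: string): string => handleSendMessageRef.current(text);

render("chat-2", handleSendMessageRef); // a later render

console.log(staleTranscribeAndSend("hi")); // chat-1:hi (stale)
console.log(refTranscribeAndSend("hi")); // chat-2:hi (latest)
```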
…e to avoid stale closure
These functions are idempotent, so guarding on ttsIsGenerating/ttsIsPlaying was unnecessary and caused stale closures when exitVoiceMode was captured in the event listener effect.
Co-Authored-By: marks <markskram@protonmail.com>
@TestFlight build
🚀 TestFlight deployment triggered! Check the Actions tab for progress.
…on generation error
- Set voiceState(null) after successful transcription in non-voice mode to dismiss overlay
- Add errorRef to track error state without stale closures
- Voice continuation effect checks errorRef.current and exits voice mode on error instead of speaking stale assistant message
Co-Authored-By: marks <markskram@protonmail.com>
✅ TestFlight deployment completed successfully!
…de exits
Without re-throwing, speakAndWait always resolves normally even on TTS failure, causing the voice mode continuation effect's .catch() to never fire and creating an infinite loop: TTS fails → recording restarts → repeat.
Co-Authored-By: marks <markskram@protonmail.com>
…n TTSButton
speak() is used by TTSButton, which doesn't have a try/catch. Only speakAndWait (used by the voice mode loop) needs error propagation to exit on TTS failure.
Co-Authored-By: marks <markskram@protonmail.com>
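Taken together, these two commits split error handling by caller: the shared implementation re-throws, the fire-and-forget wrapper swallows, and the awaitable wrapper propagates. A sketch of that split, assuming a hypothetical injected `synthesize` function and `ttsState` object in place of the real TTS engine and context state:

```typescript
type Synthesize = (text: string) => Promise<void>;

// Stand-in for context state that must be reset whatever happens.
const ttsState = { isGenerating: false };

// Shared implementation: local cleanup, then re-throw so callers can react.
async function speakInternal(text: string, synthesize: Synthesize): Promise<void> {
  ttsState.isGenerating = true;
  try {
    await synthesize(text);
  } catch (err) {
    console.error("speakInternal: TTS failed", err);
    throw err; // without this, speakAndWait would always resolve
  } finally {
    ttsState.isGenerating = false;
  }
}

// Fire-and-forget (TTSButton-style): swallow errors to avoid unhandled rejections.
function speak(text: string, synthesize: Synthesize): void {
  speakInternal(text, synthesize).catch(() => {
    // Already logged in speakInternal; nothing else to do for a button.
  });
}

// Awaitable (voice-loop-style): let the caller decide what a failure means.
function speakAndWait(text: string, synthesize: Synthesize): Promise<void> {
  return speakInternal(text, synthesize);
}

// Usage: the voice loop exits on failure instead of restarting recording.
const failingTts: Synthesize = async () => {
  throw new Error("tts_failed");
};
speak("hello", failingTts); // no unhandled rejection
speakAndWait("hello", failingTts).catch(() => console.log("exitVoiceMode()"));
```

Swallowing in `speak()` keeps a button click from producing an unhandled rejection, while `speakAndWait()` lets the voice loop tell success from failure and break the restart loop.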
Devin Review round 12:
- TTSContext: store audioContextRef immediately after AudioContext creation
so stopPlayback() can close it if decodeAudioData or other operations
between creation and the old ref assignment throw.
- TTSContext: throw Error('no_speakable_text') instead of silently returning
when preprocessTextForTTS strips all content (e.g. code-only responses).
- UnifiedChat: catch 'no_speakable_text' in both voice continuation effects
and restart recording instead of exiting voice mode, so the user gets
audio feedback (mic-on cue) rather than a silent mic activation.
Co-Authored-By: marks <markskram@protonmail.com>
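The `no_speakable_text` change turns an empty result into an error the caller can tell apart from other TTS failures. A synchronous sketch of that policy; `toSpeakableText` is a hypothetical stand-in for `preprocessTextForTTS` (it only strips Markdown code), and the outcome names are illustrative rather than taken from `UnifiedChat`:

```typescript
// Hypothetical stand-in for preprocessTextForTTS: removes fenced and inline
// Markdown code, then collapses whitespace.
function toSpeakableText(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced code blocks
    .replace(/`[^`]*`/g, " ") // inline code spans
    .replace(/\s+/g, " ")
    .trim();
}

// Throw instead of returning silently, so callers can tell "nothing to say"
// apart from "spoke successfully".
function prepareForSpeech(text: string): string {
  const speakable = toSpeakableText(text);
  if (speakable.length === 0) {
    throw new Error("no_speakable_text");
  }
  return speakable;
}

type ReplyOutcome = "spoken" | "restartRecording" | "exitVoiceMode";

// Caller-side policy: an empty result restarts recording (so the mic-on cue
// plays); any other TTS failure exits voice mode.
function handleAssistantReply(text: string, speakFn: (speakable: string) => void): ReplyOutcome {
  try {
    speakFn(prepareForSpeech(text));
    return "spoken";
  } catch (err) {
    if (err instanceof Error && err.message === "no_speakable_text") {
      return "restartRecording";
    }
    return "exitVoiceMode";
  }
}
```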
…extRef cleanup
Devin Review round 13:
- UnifiedChat: in startRecording's catch block, call exitVoiceMode() when voice mode is active so the UI doesn't show 'Recording' overlay when no recording is actually happening (mic permission denied, device busy, etc.)
- TTSContext: only null audioContextRef.current in the staleness check if it still points to this call's AudioContext, preventing a concurrent speakInternal call's ref from being clobbered.
Co-Authored-By: marks <markskram@protonmail.com>
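A sketch of the ownership check together with the sequence-ID idea from the first commit, assuming a hypothetical `Closable` interface in place of `AudioContext` and hypothetical `beginSpeak`/`isStale`/`releaseIfOwned` helpers:

```typescript
// Minimal stand-in for AudioContext so the sketch runs anywhere.
interface Closable {
  close(): Promise<void>;
}

interface Ref<T> {
  current: T;
}

const audioContextRef: Ref<Closable | null> = { current: null };
let latestSequenceId = 0;

// Each speak call gets a fresh sequence ID and stores its context in the ref
// right away, so stopPlayback() could close it even if a later step throws.
function beginSpeak(createContext: () => Closable): { id: number; ctx: Closable } {
  const id = ++latestSequenceId;
  const ctx = createContext();
  audioContextRef.current = ctx;
  return { id, ctx };
}

// Checked after each await: a newer call means this one should bail out.
function isStale(id: number): boolean {
  return id !== latestSequenceId;
}

// Clean up this call's context, but only clear the shared ref if it still
// points at this context; otherwise a newer call's ref would be clobbered.
function releaseIfOwned(ctx: Closable): void {
  if (audioContextRef.current === ctx) {
    audioContextRef.current = null;
  }
  void ctx.close();
}
```

The identity comparison is what prevents the clobbering: a stale call still closes its own context but leaves the shared ref alone once a newer call has replaced it.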
…, compact layout gap, TTS auto-play after download
Bug 1: Audio cues not playing on iOS
- AudioContext starts in 'suspended' state on iOS; added ctx.resume() before fetching and playing the wav file.
Bug 2: Previous recording duration flashes on re-entry
- setDuration(0) in useEffect fires after the first render with stale state. Added synchronous state reset during render when effectiveState transitions to 'recording'.
Bug 3: Compact playback overlay layout imbalanced
- Waveform too near top, status text too near bottom. Reduced gap from gap-6 to gap-2 in compact mode.
Bug 4: TTS auto-play after model download not triggering
- When ttsStatus transitions to 'ready' mid-voice-loop, the user may already be recording. Now stops active recording before transitioning to 'generating' state.
Co-Authored-By: marks <markskram@protonmail.com>
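Bug 1's fix can be sketched as a small guard that runs before any playback. To keep it runnable outside a browser, the sketch types the context through a hypothetical `ResumableContext` interface containing only the `state` and `resume()` members that `AudioContext` also has; throwing when the context still isn't running is an added assumption, not something the commit states.

```typescript
// Only the AudioContext members this sketch needs, so it runs outside a browser.
interface ResumableContext {
  readonly state: string; // e.g. "suspended", "running", "closed"
  resume(): Promise<void>;
}

// iOS can hand back a "suspended" context; a source started on it stays silent.
// Resume first, and fail loudly (an assumption) if it still isn't running.
async function ensureRunning(ctx: ResumableContext): Promise<void> {
  if (ctx.state === "suspended") {
    await ctx.resume();
  }
  if (ctx.state !== "running") {
    throw new Error(`AudioContext is not running (state: ${ctx.state})`);
  }
}

// Browser usage, before fetching/decoding the cue and calling source.start(0):
// const ctx = new AudioContext();
// await ensureRunning(ctx);
```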
…c decoding window Co-Authored-By: marks <markskram@protonmail.com>
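Per the PR notes, this commit makes `cancelGeneration()` call `stopPlayback()`. The sketch below shows why that closes the gap: a context created before an in-flight decode gets closed, and the pipeline checks for that after the `await`. `PlaybackContext`, `playbackRef`, and `speakSketch` are hypothetical simplifications (a real `AudioContext.close()` is asynchronous and exposes `state === "closed"` rather than a `closed` flag).

```typescript
// Simplified stand-in for AudioContext: a boolean flag instead of state.
interface PlaybackContext {
  closed: boolean;
  close(): void;
}

const playbackRef: { current: PlaybackContext | null } = { current: null };

function stopPlayback(): void {
  playbackRef.current?.close();
  playbackRef.current = null;
}

// Cancelling generation also stops playback, so a context that exists while
// decoding is still in flight gets closed too.
function cancelGeneration(): void {
  stopPlayback();
}

// Hypothetical pipeline with an async decode step in the middle.
async function speakSketch(
  ctx: PlaybackContext,
  decode: () => Promise<string>,
  play: (decoded: string) => void,
): Promise<boolean> {
  playbackRef.current = ctx;
  const decoded = await decode(); // cancellation can land during this await
  if (ctx.closed) return false; // cancelled: never start playback
  play(decoded);
  return true;
}
```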
… on iOS
- playAudioCue now returns a Promise that resolves when playback ends
- startRecording: await playAudioCue('mic-on') BEFORE getUserMedia
- stopRecording: stop mic stream tracks BEFORE calling transcribeAndSend (which plays mic-off)
- Set navigator.audioSession.type = 'playback' to bypass iOS silent switch (Safari 17+)
Co-Authored-By: marks <markskram@protonmail.com>
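The first two bullets of this commit amount to wrapping a cue in a Promise and awaiting it before opening the microphone. A sketch, assuming a hypothetical `CueSource` interface covering the `onended`/`start()` members of `AudioBufferSourceNode`, an injected `getMic` in place of `getUserMedia`, and a 2-second fallback timeout (not in the PR); the `log` array only exists to make the order observable.

```typescript
// Subset of AudioBufferSourceNode that the sketch relies on.
interface CueSource {
  onended: (() => void) | null;
  start(when?: number): void;
}

// Resolves when the cue finishes, immediately if start() throws, or after a
// fallback timeout (an assumption, so a missing "ended" event can't hang us).
function playCue(source: CueSource, timeoutMs = 2000): Promise<void> {
  return new Promise<void>((resolve) => {
    const timer = setTimeout(() => resolve(), timeoutMs);
    source.onended = () => {
      clearTimeout(timer);
      resolve();
    };
    try {
      source.start(0);
    } catch {
      clearTimeout(timer);
      resolve();
    }
  });
}

// Ordering from the commit: let the mic-on cue finish, THEN open the mic,
// so iOS switching to a recording audio session cannot cut the cue off.
async function startRecordingSketch(
  cue: CueSource,
  getMic: () => Promise<string>,
  log: string[],
): Promise<string> {
  await playCue(cue);
  log.push("cue ended");
  const stream = await getMic();
  log.push("mic opened");
  return stream;
}
```

The timeout is a defensive choice: without it, a cue whose `ended` event never fires would block recording indefinitely.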
The 'playback' session type is incompatible with getUserMedia on iOS. Now saves the previous session type before setting 'playback' and restores it in onended/catch/error paths so mic activation works. Co-Authored-By: marks <markskram@protonmail.com>
'playback' session type caused two issues:
1. Blocked getUserMedia with InvalidStateError
2. Second mic-on cue was silent after a recording cycle
'play-and-record' bypasses the iOS silent switch AND is compatible with mic capture, eliminating the need for save/restore logic.
Co-Authored-By: marks <markskram@protonmail.com>
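A sketch of setting the session type with feature detection. The type strings are assumed to match the W3C Audio Session draft's `AudioSessionType` values; the `NavigatorWithAudioSession` interface and `setAudioSessionType` helper are hypothetical, since TypeScript's DOM typings may not declare `navigator.audioSession`.

```typescript
// Values assumed to follow the W3C Audio Session draft's AudioSessionType.
type AudioSessionType =
  | "auto"
  | "playback"
  | "transient"
  | "transient-solo"
  | "ambient"
  | "play-and-record";

// Hypothetical narrow typing; lib.dom may not declare navigator.audioSession.
interface NavigatorWithAudioSession {
  audioSession?: { type: AudioSessionType };
}

// Returns true if the type was applied, false where the API is missing.
function setAudioSessionType(nav: NavigatorWithAudioSession, sessionType: AudioSessionType): boolean {
  if (!nav.audioSession) return false; // older browsers: skip quietly
  try {
    nav.audioSession.type = sessionType;
    return true;
  } catch {
    return false;
  }
}

// In a browser:
// setAudioSessionType(navigator as unknown as NavigatorWithAudioSession, "play-and-record");
```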
The WAV files are now at the correct volume from the source. Added ?v=2 query param to bypass mobile Safari's cached old files. Co-Authored-By: marks <markskram@protonmail.com>
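The `?v=2` trick works because the browser cache is keyed by the full URL, so a new query string is treated as a new resource. A small helper sketch (hypothetical name `withAssetVersion`; the `/audio/mic-on.wav` path is assumed from the file list in the notes):

```typescript
// Appends v=<version> so a new URL bypasses a cached copy of the old file.
// Existing query parameters and any #fragment are preserved.
function withAssetVersion(path: string, version: string | number): string {
  const hashIndex = path.indexOf("#");
  const base = hashIndex === -1 ? path : path.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : path.slice(hashIndex);
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}v=${encodeURIComponent(String(version))}${fragment}`;
}

console.log(withAssetVersion("/audio/mic-on.wav", 2)); // /audio/mic-on.wav?v=2
```

Bumping the version again is all a future change to the WAV files would need.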
…bust mobile Safari cache Co-Authored-By: marks <markskram@protonmail.com>
…e debug buttons Co-Authored-By: marks <markskram@protonmail.com>
…oice loop Co-Authored-By: marks <markskram@protonmail.com>
Voice Mode Improvements
Summary
Implements the maple-voice-improvements spec with three main features:
1. Voice Mode — Continuous Loop State Machine
When a user starts recording on a TTS-capable platform (desktop/iOS), the app enters voice mode: a hands-free loop of Recording → Processing → Waiting → Generating → Playing → (500ms pause) → Recording. The mic button highlights when voice mode is active and acts as an exit button. Exit is also triggered on chat switch, new chat, or any overlay X button.
2. Audio Error Recovery
On transcription failure, the audio blob is retained in memory. The overlay shows an error state with the original recording duration, a Retry button (re-sends the same blob), and a Discard button. This works both inside and outside voice mode.
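A sketch of the retain/retry/discard flow, assuming an injected `transcribe` function and a generic audio type so it runs without browser APIs; `RecordingRecovery` and its members are hypothetical names, not the PR's components.

```typescript
// A failed recording kept in memory for the error overlay.
interface PendingRecording<TAudio> {
  audio: TAudio;
  durationSeconds: number;
}

class RecordingRecovery<TAudio> {
  private pending: PendingRecording<TAudio> | null = null;
  private readonly transcribe: (audio: TAudio) => Promise<string>;

  constructor(transcribe: (audio: TAudio) => Promise<string>) {
    this.transcribe = transcribe;
  }

  get hasPending(): boolean {
    return this.pending !== null;
  }

  // Original duration, for display next to Retry/Discard.
  get pendingDurationSeconds(): number | null {
    return this.pending ? this.pending.durationSeconds : null;
  }

  // Try to transcribe; on failure keep the audio instead of dropping it.
  async send(audio: TAudio, durationSeconds: number): Promise<string | null> {
    try {
      const text = await this.transcribe(audio);
      this.pending = null;
      return text;
    } catch {
      this.pending = { audio, durationSeconds };
      return null;
    }
  }

  // Retry re-sends exactly the same audio that failed.
  retry(): Promise<string | null> {
    if (!this.pending) return Promise.resolve(null);
    return this.send(this.pending.audio, this.pending.durationSeconds);
  }

  // Discard drops the retained audio.
  discard(): void {
    this.pending = null;
  }
}
```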
3. TTS Discovery Prompt & Enhanced Feedback
Audio cues: mic-on.wav (gentle ascending tone) plays when recording starts and mic-off.wav (confirmation tone) plays after sending successfully.
Updates since last revision
Audio cue ordering & iOS silent switch fix
Root cause confirmed via debug buttons: the playAudioCue function itself works fine on mobile Safari. The issue was that iOS switches to a play-and-record audio session when getUserMedia activates the microphone, which mutes/interrupts any in-progress Web Audio playback.
Fixes:
- playAudioCue now returns a Promise that resolves when the sound finishes playing (or immediately on error), enabling proper sequencing.
- await playAudioCue("mic-on") is called before getUserMedia() in startRecording, so the sound completes before iOS switches audio sessions.
- In stopRecording, stream tracks are released before calling transcribeAndSend (which plays mic-off), so the audio session is free for playback.
- navigator.audioSession.type = 'playback' (Safari 17+) is set before playing each cue, so audio plays regardless of the physical silent switch position.
- cancelGeneration() now calls stopPlayback(): closes the AudioContext to prevent audio from playing if cancellation happens during the async arrayBuffer()/decodeAudioData() window (Devin Review fix).
Debug buttons (temporary, to be removed)
Two test buttons ("Mic On" green, "Mic Off" red) are rendered above the input box for the user to verify audio cue playback on mobile Safari. These are explicitly temporary and will be removed in a follow-up change once audio cue behavior is confirmed end-to-end.
Earlier updates
TestFlight Bug Fixes (round 2)
- Audio cues not playing on iOS: the AudioContext starts in the "suspended" state. Added ctx.resume() before fetching and decoding the WAV file. Without this, source.start(0) silently fails on iOS because the context never transitions to "running".
- Previous recording duration flashes on re-entry: setDuration(0) inside useEffect fires after the first render with stale state, causing a one-frame flash of the previous duration. Added a synchronous state reset during render (comparing prevEffectiveStateRef) that runs before React paints, eliminating the flash.
- Compact playback overlay layout imbalanced: the waveform sat too near the top and the status text too near the bottom with gap-6 spacing. Reduced to gap-2 in compact mode (isCompact ? "gap-2" : "gap-6") for a tighter, more balanced layout.
- TTS auto-play after model download not triggering: when ttsStatus transitions to "ready" mid-voice-loop, the user is already back in recording state with the mic active. The prevTtsStatusRef effect now stops the active recording (cleans up recorder and stream) before transitioning to "generating" state and calling speakAndWait.
Devin Review rounds 11–13
- setDuration(0) is now called at the start of the recording effect in RecordingOverlay.tsx to prevent a one-frame flash of the previous recording's stale duration before requestAnimationFrame resets it.
- Removed playAudioCue("mic-on") from all 4 voice continuation caller sites in UnifiedChat.tsx. startRecording() is now the single source of truth for the mic-on cue, eliminating the stutter/echo caused by double playback.
- audioContextRef.current is now assigned immediately after new AudioContext() creation (before decodeAudioData, resume, etc.) so stopPlayback() can close it if any subsequent operation throws. Previously, the ref was set after several async operations, leaving orphaned contexts on failure.
- speakInternal now throws Error("no_speakable_text") (instead of silently returning) when preprocessTextForTTS strips all content (e.g., code-only assistant responses). Both voice continuation catch blocks handle this by restarting recording with audio feedback, rather than silently activating the mic.
- startRecording's catch block now calls exitVoiceMode() when voice mode is active, preventing a stuck "Recording" overlay when the microphone is unavailable (permission denied, device busy, etc.).
- The staleness check after audioContext.resume() now only nulls audioContextRef.current if it still points to this call's AudioContext, preventing a concurrent speakInternal call's ref from being clobbered.
TestFlight Bug Fixes (round 1)
- Added a prevTtsStatusRef effect that watches for ttsStatus transitioning from non-ready → "ready" while voice mode is active. When the user downloads a TTS model mid-voice-loop, this retroactively speaks the last assistant message instead of silently skipping TTS.
- The compact overlay (isCompact={true}) previously hid all status content and the waveform, showing only a black overlay with an X button during TTS playback. It now shows status text (Playing, Generating, Waiting, Error) and the animated waveform in compact mode for non-recording states.
- Replaced new Audio('/audio/file.wav') with the Web Audio API (AudioContext + fetch + decodeAudioData + BufferSource). HTMLAudioElement.play() is unreliable in the iOS Tauri WebView due to autoplay restrictions.
- startRecordingRef.current() after empty blob recovery now uses setTimeout(0) to defer until React's setIsRecording(false) batch commits.
Earlier fixes (rounds 1–9)
- Added setVoiceState(null) after successful transcription in non-voice mode so the overlay dismisses.
- The voice continuation effect checks errorRef.current before proceeding with TTS. If handleSendMessage failed, voice mode exits gracefully instead of speaking stale messages.
- speakInternal re-throws errors after handling them locally, allowing speakAndWait's .catch() to call exitVoiceMode().
- The speak() wrapper (used by TTSButton) now catches errors from speakInternal to prevent unhandled promise rejections. Only speakAndWait (voice mode) propagates errors.
- exitVoiceMode() calls cancelTTSGeneration() and stopTTS() unconditionally instead of guarding on stale ttsIsGenerating/ttsIsPlaying.
- recordingStartTimeRef for accurate duration capture, startRecordingRef for the stale closure in handleVoiceDiscard, handleSendMessageRef/handleTTSDiscoveryRef for stale closures, isTauri() guard on the iOS TTS platform check, ?? for 0-second duration display, recording restart on empty blob, mic leak fix.
Review & Testing Checklist for Human
- (1) await playAudioCue() blocks startRecording until the sound finishes: verify no hang on iOS; (2) navigator.audioSession.type = 'playback' may conflict with the subsequent getUserMedia audio session switch; (3) debug buttons are temporary and should NOT be merged; (4) mic stream cleanup moved earlier in stopRecording: ensure transcription still works.
- … source.onended is firing.
- cancelGeneration() behavior: during TTS generation in voice mode, tap the speaker icon to cancel. Verify no audio plays after cancellation (even if decoding was in progress).
Notes
- TTSContext.tsx: cancelGeneration() now calls stopPlayback() to close the AudioContext during the async decoding window
- UnifiedChat.tsx: playAudioCue() returns a Promise, startRecording awaits the mic-on cue before getUserMedia, stopRecording stops the stream before the mic-off cue, navigator.audioSession.type = 'playback' is set before each cue playback, DEBUG BUTTONS added above the input (temporary)
- frontend/public/audio/mic-on.wav and mic-off.wav (both 394888 bytes, confirmed served)
- navigator.audioSession API (Safari 17+); degrades gracefully on older browsers
Link to Devin Session: https://app.devin.ai/sessions/0c853f0e1ba84474971875a61f616769
Requested by: @marksftw